In order to retrieve unlabeled images by textual queries, cross-media similarity computation is a key ingredient. Although novel methods are continuously introduced, little has been done to evaluate these methods together with large-scale query log analysis. Consequently, how far these methods have brought us in answering real-user queries remains unclear. Given baseline methods that compute cross-media similarity using relatively simple text/image matching, it is also unclear how much progress advanced models have made. This paper takes a pragmatic approach to answering these two questions. Queries are automatically categorized according to the proposed query visualness measure, and later connected to the evaluation of multiple cross-media similarity models on three test sets. This connection reveals that the success of the state-of-the-art methods is mainly attributed to their good performance on visual-oriented queries, while such queries account for only a small part of real-user queries. To quantify the current progress, we propose a simple text2image method that represents a novel test query by a set of images selected from a large-scale query log. Computing the cross-media similarity between the test query and a given image then boils down to comparing the given image with the selected images in terms of visual similarity. Image retrieval experiments on the challenging Clickture dataset show that the proposed text2image compares favorably to recent deep learning based alternatives.
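As a minimal sketch of the text2image idea described above (the exact aggregation function is not specified in the abstract; averaging is an assumption here), the cross-media similarity between a test query $q$ and a candidate image $d$ could take the form
$$
s(q, d) \;=\; \frac{1}{|I_q|} \sum_{x \in I_q} \mathrm{sim}_{\mathrm{vis}}(d, x),
$$
where $I_q$ denotes the set of images selected from the query log to represent $q$, and $\mathrm{sim}_{\mathrm{vis}}(\cdot,\cdot)$ is a visual similarity between two images. Other aggregations, such as taking the maximum over $x \in I_q$, would instantiate the same scheme.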